Modeling The Language Assessment Process And Result: Proposed Architecture For Automatic Oral Proficiency Assessment
Abstract
We outline challenges for modeling human language assessment in automatic systems, both in terms of the process and the reliability of the result. We propose an architecture for a system to evaluate learners of Spanish via the Computerized Oral Proficiency Instrument, to determine whether they have 'reached' or 'not reached' the Intermediate Low level of proficiency, according to the American Council on the Teaching of Foreign Languages (ACTFL) Speaking Proficiency Guidelines. Our system divides the acoustic and non-acoustic features, incorporating human process modeling where permitted by the technology and required by the domain. We suggest that machine learning techniques applied to this type of system permit insight into as yet unarticulated aspects of the human rating process.

1 Introduction

Computer-mediated language assessment appeals to educators and language evaluators because it has the potential to make language assessment widely available with minimal human effort and limited expense. Fairly robust results (r ≈ 0.8) have been achieved in the commercial domain modeling human rater results, with both the Electronic Essay Rater (e-rater) system for written essay scoring (Burstein et al., 1998) and the PhonePass pronunciation assessment (Ordinate, 1998).

There are at least three reasons why it is not possible to model the human rating process. First, there is a mismatch between what the technology is able to handle and what people manipulate, especially in the assessment of speech features. Second, we lack a well-articulated model of the human process, which is often characterized as holistic. Certain assessment features have been identified, but their relative importance is not clear. Furthermore, unlike automatic assessments, human raters of oral proficiency exams are trained to focus on competencies, which are difficult to enumerate. In contrast, automatic assessments of spoken language fluency typically rely on some type of error counting, comparing duration, silence, speaking rate, and pronunciation mismatches with native-speaker models.

There is, therefore, a basic tension within the field of computer-mediated language assessment between modeling the assessment process of human raters and achieving comparable, consistent assessments, perhaps through different means. Neither extreme is entirely satisfactory. A spoken assessment system that achieves human-comparable performance based only, for example, on the proportion of silence in an utterance would seem not to be capturing a number of critical elements of language competence, regardless of how accurate the assessments are. Such a system would also be severely limited in its ability to provide constructive feedback to language learners or teachers. The e-rater system has received similar criticism for basing essay assessments on a number of largely lexical features, rather than on a deeper, more human-style rating process.

Third, however, even if we could articulate and model human performance, it is not clear that we want to model all aspects of the human rating process. For example, human performance varies due to fatigue. Transcribers often inadvertently correct examinees' errors of omitted or incorrect articles, conjugations, or affixes. These mistakes are a natural effect of a cooperative listener; however, they result in an over-optimistic assessment of the speaker's actual proficiency. We arguably do not wish to build this sort of cooperation into an automated system.
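As a point of reference for the error-counting style of fluency assessment described above, the sketch below computes a few duration-based surface features (proportion of silence, speaking rate) from a word-level time alignment. It is an illustrative assumption, not part of the architecture proposed in the paper; the WordSegment structure and feature names are invented for the example.

```python
# Hypothetical sketch, not the paper's method: surface fluency features of the
# kind that error-counting assessments derive from a response and compare
# against native-speaker models.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WordSegment:
    """One word from a forced-alignment or ASR pass (times in seconds)."""
    word: str
    start: float
    end: float


def fluency_features(segments: List[WordSegment], response_duration: float) -> Dict[str, float]:
    """Duration-based fluency measures for a single spoken response."""
    speech_time = sum(s.end - s.start for s in segments)
    silence_time = max(response_duration - speech_time, 0.0)
    return {
        # Proportion of the response spent in silence (pauses, hesitations).
        "silence_proportion": silence_time / response_duration if response_duration else 0.0,
        # Words per second over the whole response, pauses included.
        "speaking_rate": len(segments) / response_duration if response_duration else 0.0,
        # Words per second over speech time only, pauses excluded.
        "articulation_rate": len(segments) / speech_time if speech_time else 0.0,
    }


if __name__ == "__main__":
    # A made-up 10-second response with three aligned Spanish words.
    demo = [
        WordSegment("hola", 0.5, 0.9),
        WordSegment("me", 1.4, 1.6),
        WordSegment("llamo", 1.6, 2.1),
    ]
    print(fluency_features(demo, response_duration=10.0))
```

A rater-comparable score would then be obtained by comparing such features against native-speaker distributions, which is exactly the kind of surface measure the authors argue cannot, on its own, capture language competence or support constructive feedback.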